Persistence in high-dimensional linear predictor selection and the virtue of over-parametrization
Author
Abstract
Let $Z^i = (Y^i, X_1^i, \ldots, X_m^i)$, $i = 1, \ldots, n$, be i.i.d. random vectors, $Z^i \sim F$, $F \in \mathcal{F}$. It is desired to predict $Y$ by $\sum_j \beta_j X_j$, where $(\beta_1, \ldots, \beta_m) \in B \subseteq \mathbb{R}^m$, under a prediction loss. Suppose that $m = n^\alpha$, $\alpha > 1$, i.e., there are many more explanatory variables than observations. We consider sets $B$ restricted by the maximal number of non-zero coefficients of their members, or by their $\ell_1$ radius. We study the following asymptotic question: how "large" may the set $B$ be, so that it is still possible to select empirically a predictor whose risk under $F$ is close to that of the best predictor in the set? Sharp bounds for orders of magnitude are given under various assumptions on $\mathcal{F}$. Algorithmic complexity of the ensuing procedures is also studied. The main message of this paper, and the implication of the orders derived above, is that under various sparsity assumptions on the optimal predictor there is "asymptotically no harm" in introducing many more explanatory variables than observations. Furthermore, such practice can be beneficial in comparison with a procedure that screens in advance a small subset of explanatory variables. Another main result is that "Lasso"-type procedures, i.e., optimization under an $\ell_1$ constraint, could be efficient in finding optimal sparse predictors in high dimensions.
Running head: Persistence and Predictor Selection.
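The "Lasso"-type procedure mentioned in the abstract, i.e. minimizing empirical prediction error subject to an $\ell_1$-radius constraint with $m = n^\alpha$ explanatory variables, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: squared-error loss, Gaussian simulated data, the values of `n`, `alpha` and `radius`, the function names, and the choice of projected gradient descent as the solver.

```python
import random

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {b : sum_j |b_j| <= radius} (sort-based)."""
    if sum(abs(x) for x in v) <= radius:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - radius) / k
        if uk > t:  # holds for every k up to the support size, then fails
            theta = t
    # Soft-threshold each coordinate by theta, keeping its sign.
    return [max(abs(x) - theta, 0.0) * (1.0 if x > 0 else -1.0) for x in v]

def empirical_risk(X, y, beta):
    """Mean squared prediction error of the linear predictor beta on (X, y)."""
    n = len(y)
    return sum((yi - sum(b * x for b, x in zip(beta, xi))) ** 2
               for xi, yi in zip(X, y)) / n

def l1_constrained_least_squares(X, y, radius, n_iter=200):
    """Projected gradient descent for min empirical_risk s.t. ||beta||_1 <= radius."""
    n, m = len(X), len(X[0])
    # Conservative step: 2 * trace(X'X) / n bounds the gradient's Lipschitz constant.
    step = n / (2.0 * sum(x * x for row in X for x in row))
    beta = [0.0] * m
    for _ in range(n_iter):
        resid = [sum(b * x for b, x in zip(beta, xi)) - yi for xi, yi in zip(X, y)]
        grad = [2.0 / n * sum(r * xi[j] for r, xi in zip(resid, X)) for j in range(m)]
        beta = project_l1_ball([b - step * g for b, g in zip(beta, grad)], radius)
    return beta

# Hypothetical simulation: n observations, m = n^alpha variables, sparse truth.
random.seed(0)
n, alpha = 30, 1.5
m = int(n ** alpha)
X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(n)]
beta_true = [1.0, -1.0, 0.5] + [0.0] * (m - 3)
y = [sum(b * x for b, x in zip(beta_true, xi)) + random.gauss(0, 0.5) for xi in X]

radius = 2.5  # l1 radius of the candidate set B (matches ||beta_true||_1 here)
beta_hat = l1_constrained_least_squares(X, y, radius)
print("m =", m, "n =", n)
print("l1 norm of estimate:", sum(abs(b) for b in beta_hat))
print("empirical risk:", empirical_risk(X, y, beta_hat))
```

The step size uses a deliberately loose bound so that each iteration cannot increase the empirical risk; a practical implementation would likely use a tighter bound or a dedicated Lasso solver.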
Similar resources
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
The Control Parametrization Enhancing Technique for Multi-Objective Optimal Control of HIV Dynamic
In this paper, a computational approach is adopted for solving a multi-objective optimal control problem (MOOCP) formulation of optimal drug scheduling in individuals infected by the human immunodeficiency virus (HIV). The MOOCP, which uses a mathematical model of HIV infection, has some incompatible objectives. The objectives are maximizing the survival time of patients, the level of D...
Three Dimensional Non-linear Radiative Nanofluid Flow over a Riga Plate
Numerous engineering processes take place at high temperature and operate in a way that involves non-linear radiation. In weakly conducting fluids, however, the currents induced by an external magnetic field alone are too small, and an external electric field must be applied to achieve efficient flow control. Gailitis and Lielausis devised the Riga plate to gene...
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not perform well in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...
Publication date: 2004